Abstract

Standard setting procedures are most commonly evaluated by simply comparing
the derived standards or failure rates across techniques. This study instead
investigated the errors of classification associated with the contrasting
groups procedure. Monte Carlo simulations were employed to produce
masters/nonmasters score distributions sampled from normal and left-skewed
parent score populations. In addition, three levels of score distribution
overlap (noise) between the master/nonmaster subpopulations were simulated to
examine the effects of this phenomenon on errors of classification.

A Comparison of Approaches for Setting Proficiency Standards

via 

Monte Carlo Simulations 

School districts and a variety of other agencies responsible for establishing
testing standards that identify acceptable levels of proficiency face not
only the dilemma of deciding which standard setting procedure to employ but
also the issue of subjectivity often associated with the believability of
the results generated by the chosen technique(s). Current practices
associated with the setting of standards in educational testing can be
broadly placed into one of three categories: (1) comparisons with the
performance of others, i.e., the normative approach; (2) considerations of
the consequences of misclassification, such as the borderline or contrasting
groups techniques; (3) examination of item content, such as the Nedelsky
(1954), Ebel (1972), or Angoff (1971) procedures.

Investigations of the variety of standard setting techniques have been
limited to comparisons of generated passing scores and/or the number of
individuals failing for the procedures studied. Andrew and Hecht (1976) and
Schoon, et al. (1970) compared the standards generated by the Nedelsky and
Ebel techniques; Skakun and Kling (1980) investigated the comparability of
passing scores derived from the Nedelsky, Ebel, and a modification of the
Ebel procedure; Poggio, et al. (1981) concentrated on the Angoff, Ebel,
Nedelsky and contrasting groups procedures; Koffler (1980) compared the
obtained standards from the Nedelsky and the contrasting groups technique;
Saunders, et al. (1980) studied the scores generated by two versions of the
Nedelsky approach while Brennan and Lockwood (1979) investigated the
variability of passing scores generated by the Angoff and Nedelsky procedures.
The one notable outcome of these investigations is that different approaches
for establishing a standard produce different standards.

Although the studies noted were conducted under a variety of conditions,
the conclusions are restricted to one-time comparisons among the populations
and approaches employed. In addition, none of the previously noted studies
investigated the errors of classification associated with the derived
standards or the stability of the estimates both within and across varying
techniques. Furthermore, all of these studies principally focused upon the
class of standard setting procedures related to the examination of item
content and the probabilities associated with passing a given item,
procedures to which school district personnel are less accustomed than to
judgments made about a student's level of overall performance on a test.

This study employs Monte Carlo simulations to examine the properties of
standards derived from the contrasting groups technique. In addition, these
standards are compared, on the basis of errors of classification and
stability, to the estimates of standards derived from three preselected
procedures based upon theory and empirical evidence. Stability, for purposes
of this investigation, was studied by simulating pairs of masters and
nonmasters subpopulations, randomly generated from either a normal or a
negatively skewed parent population. The standard associated with the minimum
number of misclassifications from the first member of the pair was used as
the standard for the second paired simulation, and the corresponding errors
of classification were tabulated. In addition, three predetermined levels of
noise (degree of sample distribution overlap between the masters and
nonmasters subpopulations) were simulated in order to study this phenomenon's
effects on the stability of errors of classification.

Background 

The decision by a school district to employ the contrasting groups
procedure as a standard setting technique for a competency testing program
is reasonably based upon two considerations: (1) teachers are more accustomed
to judging the overall adequacy of student achievement than to guessing
the probabilities of a student's success on a given item; and (2) the
contrasting groups method provides a direct assessment of errors of
classification associated with a given score (Zieky and Livingston, 1977). As
noted by Zieky and Livingston in their manual, Methods for Setting Standards
on Criterion-Referenced Tests of Basic Skills, "the idea behind the
contrasting groups method is to set a standard at the test score level that
best separates the students judged to be masters from the students judged to
be nonmasters on the objectives measured by the test."

A sample of teachers, serving as judges, is instructed to identify
several students in their classes whom they are certain are either definite
masters or nonmasters of the skills measured by the test on which a passing
score is being set. Once this process is completed, the test is administered
to the population of examinees and the scores for the previously identified
groups of masters and nonmasters are examined to determine the standard
minimizing the number of errors of classification. Two types of error are
associated with this procedure: (1) classifying as a master a student who has
not adequately mastered the objectives (false master, Type I error); (2)
classifying as a nonmaster a student who has adequately mastered the
objectives (false nonmaster, Type II error). Raising the standard reduces
the number of Type I errors while increasing the number of Type II errors.
Lowering the standard produces the opposite results.
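
The search for the standard and the trade-off between the two error types can
be illustrated with a minimal Python sketch. It is not the program used in
this study: the function names and the ten scores are hypothetical, a student
is assumed to pass when the score is at or above the standard, and ties are
assumed to be broken in favor of the lowest score.

```python
def misclassifications(masters, nonmasters, standard):
    """Return (Type I, Type II) counts; passing is assumed to mean score >= standard."""
    false_masters = sum(1 for s in nonmasters if s >= standard)    # Type I errors
    false_nonmasters = sum(1 for s in masters if s < standard)     # Type II errors
    return false_masters, false_nonmasters


def contrasting_groups_standard(masters, nonmasters):
    """Return the lowest candidate score with the fewest total misclassifications."""
    scores = list(masters) + list(nonmasters)
    candidates = range(min(scores), max(scores) + 2)
    # min() keeps the first minimum it meets, so ties go to the lowest score.
    return min(candidates,
               key=lambda c: sum(misclassifications(masters, nonmasters, c)))


# Hypothetical judged groups (illustrative values only).
masters = [52, 55, 58, 60, 63]
nonmasters = [40, 44, 47, 50, 56]

standard = contrasting_groups_standard(masters, nonmasters)
print(standard, misclassifications(masters, nonmasters, standard))
print(misclassifications(masters, nonmasters, 57))   # a higher standard
print(misclassifications(masters, nonmasters, 45))   # a lower standard
```

With these illustrative scores, moving the standard upward trades false
masters for false nonmasters, and moving it downward does the reverse.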

The contrasting groups method employed by the school district providing
the empirical test data for this study was utilized over a three year period
to set standards for reading and mathematics competency tests. Two salient
trends became apparent over this period. First, for both the population of
students tested and the subgroups defining the masters and nonmasters, the
distribution of scores for the reading competency test exhibited significant
negative skewness, while the distribution of scores on the mathematics tests
approximated a normal distribution (Reference Note 1). Second, the degree of
overlap between the groups of student masters and nonmasters was consistently
greater for the reading than the mathematics competency test. These empirical
conditions provided the framework within which the Monte Carlo simulation was
pursued.

Monte Carlo Simulation 



The Ahrens and Dieter algorithm for beta parameters (Ahrens and Dieter,
1974) was used to simulate normal and negatively skewed population
distributions with a raw score range of 1 to 100. For the normal
distribution, $\alpha$ and $\beta$ were set at 10, resulting in a population
mean of 49.59 and a standard deviation of 6.82. To generate the negatively
skewed distribution, $\alpha$ was set at 10 and $\beta$ was set at 2,
representing a highly negatively skewed distribution modeling the empirical
data for the reading tests. This distribution had a mean of 87.02 and a
standard deviation of 6.16. Each distribution consisted of 2450 nonzero
values representing the average number of students within the school district
taking either the mathematics or reading competency test.
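
A rough sketch of this step follows. It substitutes NumPy's built-in beta
generator for the Ahrens and Dieter algorithm and assumes a simple linear
rescaling of the beta draws onto the 1 to 100 score range; the report does
not describe its scaling, so this sketch should not be expected to reproduce
the means and standard deviations quoted above. The function names and seeds
are hypothetical.

```python
import numpy as np


def simulate_population(alpha, beta, n=2450, low=1, high=100, seed=0):
    """Draw n beta variates and map them linearly onto integer scores low..high.

    The linear mapping is an assumption made for illustration only.
    """
    rng = np.random.default_rng(seed)
    draws = rng.beta(alpha, beta, size=n)
    return np.rint(low + draws * (high - low)).astype(int)


def skewness(scores):
    """Sample skewness: third central moment divided by the cubed SD."""
    centered = scores - scores.mean()
    return (centered ** 3).mean() / scores.std() ** 3


normal_like = simulate_population(10, 10, seed=0)   # symmetric case
left_skewed = simulate_population(10, 2, seed=1)    # negatively skewed case

for name, pop in (("normal", normal_like), ("left skewed", left_skewed)):
    print(name, round(pop.mean(), 2), round(pop.std(), 2), round(skewness(pop), 2))
```

Equal shape parameters give a symmetric, bell-shaped distribution, while a
first parameter larger than the second pushes the mass toward the top of the
score range and leaves a long lower tail.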

A Statistical Analysis System (SAS) program was written to generate
samples of masters and nonmasters subpopulations from each simulated parent
population. SAS's uniform distribution function was used to randomly sample
scores from the tails (more than one standard deviation above or below the
population mean) and middle range (within one standard deviation of the mean)
of the two populations. Paralleling the recommendations of Zieky and
Livingston (1977), the total number of scores comprising each of the masters
and nonmasters samples was maintained at greater than 100 observations.

Sampling from the middle portion of each parent population represented
the overlap (noise) between the masters and nonmasters score distributions.
Table 1 presents the proportions used to sample from the high and low score
tails of each parent population, as well as the proportion and range of the
number of cases falling within the overlap region associated with the three
noise levels. As an example, refer to Table 1, normal distribution, low
noise. Twenty percent of the scores lying more than one standard deviation
above the mean and twenty percent of those lying more than one standard
deviation below it were randomly sampled from the overall population of
scores and allocated to the masters and nonmasters groups, respectively. Of
the scores lying within plus-or-minus one standard deviation from the mean,
two and one-half percent were sampled and randomly assigned to either the
masters or nonmasters group. Varying the percentage sampled from the middle
portion of the parent population served to define the three noise levels.
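
The sampling scheme just described might be sketched as follows. This is an
illustration in Python rather than the SAS program used in the study. It
assumes that each score is kept independently when a uniform random number
falls below the stated proportion, and that middle-range scores are split
between the two groups with equal probability; the stand-in parent population
and all names are hypothetical.

```python
import numpy as np


def sample_groups(population, p_low, p_mid, p_high, rng):
    """Build masters/nonmasters samples from the tails and middle of a population."""
    population = np.asarray(population)
    mean, sd = population.mean(), population.std()
    low_tail = population[population < mean - sd]
    high_tail = population[population > mean + sd]
    middle = population[(population >= mean - sd) & (population <= mean + sd)]

    def keep(scores, proportion):
        # Keep a score when its uniform draw falls below the proportion.
        return scores[rng.random(scores.size) < proportion]

    overlap = keep(middle, p_mid)                   # the "noise" cases
    to_masters = rng.random(overlap.size) < 0.5     # random group assignment
    masters = np.concatenate([keep(high_tail, p_high), overlap[to_masters]])
    nonmasters = np.concatenate([keep(low_tail, p_low), overlap[~to_masters]])
    return masters, nonmasters, overlap.size


rng = np.random.default_rng(0)
# Stand-in parent population of 2450 scores (not the study's beta population).
population = np.clip(np.rint(rng.normal(50, 7, 2450)), 1, 100).astype(int)

# Low-noise proportions for the normal distribution, as listed in Table 1.
masters, nonmasters, n_overlap = sample_groups(population, 0.20, 0.025, 0.20, rng)
print(masters.size, nonmasters.size, n_overlap)
```

Raising `p_mid` while lowering the tail proportions, as in the middle and
high noise rows of Table 1, enlarges the share of each group that comes from
the region where masters and nonmasters cannot be separated by score.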
Table 1

Predetermined Proportions Used to Generate the Masters
and Nonmasters Subpopulations by Degree of Sample Overlap

               Noise                    Low                             High
Distribution   Level     Pairs(1)       Tail     Middle(2)              Tail

Normal         Low       [illegible]    .20      .025  (18-?6)          .20
               Middle    [43]           .16      .050  (49-99)          .16
               High      [44]           .14      .075  (82-124)         .14

Left Skewed    Low       [29]           .145     .025  (21-39)          .240
               Middle    [30]           .110     .050  (45-73)          .195
               High      [30]           .090     .075  (67-100)         .170

1 Figures in brackets represent the number of simulated pairs generated at
  each noise level.

2 Figures in parentheses reflect the range of cases falling in the areas
  of overlap between the master and nonmaster subpopulations.
Standard Setting Procedures



For the contrasting groups procedure the score that resulted in the
minimum number of errors of classification was considered to be the optimal
standard for a given sample (Reference Note 2). For purposes of this study,
the errors of classification were tabulated by counting the number of scores
in the masters distribution which fell below the derived standard and adding
this result to the number of scores from the nonmasters group which fell
above the standard. This total was subsequently divided by the total number
of scores within the overlap region of the two groups and designated the
error rate. This was done to standardize the errors of classification across
samples and noise levels for analysis purposes.
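
As a hedged illustration of this tabulation, and of how a standard derived
from the first member of a simulated pair is carried over to the second, a
short Python sketch follows. The scores and overlap counts are hypothetical,
the size of the overlap region is supplied by the caller rather than derived,
and a score equal to the standard is assumed to count as passing.

```python
def error_rate(masters, nonmasters, standard, n_overlap):
    """Misclassified cases divided by the number of cases in the overlap region."""
    masters_below = sum(1 for s in masters if s < standard)
    nonmasters_above = sum(1 for s in nonmasters if s >= standard)
    return (masters_below + nonmasters_above) / n_overlap


def best_standard(masters, nonmasters, n_overlap):
    """Lowest score that minimizes the error rate for one sample."""
    scores = masters + nonmasters
    candidates = range(min(scores), max(scores) + 2)
    return min(candidates,
               key=lambda c: error_rate(masters, nonmasters, c, n_overlap))


# A hypothetical simulated pair (illustrative values only).
first = dict(masters=[52, 55, 58, 60, 63],
             nonmasters=[40, 44, 47, 50, 56], n_overlap=4)
second = dict(masters=[49, 54, 57, 61, 64],
              nonmasters=[41, 45, 48, 53, 55], n_overlap=5)

standard = best_standard(**first)                         # set on the first member
rate_first = error_rate(standard=standard, **first)
rate_second = error_rate(standard=standard, **second)     # reused on the second
print(standard, rate_first, rate_second)
```

Because the denominator is the overlap count rather than the full sample
size, the resulting rates are comparable across noise levels even though the
samples differ in size.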

Alternative standard setting strategies included in this study and,
thus, providing a basis of comparison for the contrasting groups technique,
were: (1) the linear discriminant function (LDF) applied to the normal
distribution, and defined as:

    $[(\bar{X}_1 - \bar{X}_2)/S^2]\,[Z - (\bar{X}_1 + \bar{X}_2)/2]$          (1)

where $\bar{X}_1$ and $\bar{X}_2$ are the sample means of the masters' and
nonmasters' test scores respectively, $S^2$ is the pooled variance, and $Z$
the test score to be classified; (2) the quadratic discriminant function
(QDF) for the left skewed distribution, defined as:

    $Z[\bar{X}_1/S_1^2 - \bar{X}_2/S_2^2] - (Z^2/2)[1/S_1^2 - 1/S_2^2]
      - (1/2)[\bar{X}_1^2/S_1^2 - \bar{X}_2^2/S_2^2]
      + (1/2)\,\mathrm{LN}[S_2^2/S_1^2]$                                      (2)

where $\bar{X}_1$, $S_1^2$ and $\bar{X}_2$, $S_2^2$ are the means and
variances of the masters' and nonmasters' test scores, respectively, and $Z$
is the test score to be classified; (3) a third solution, referred to as an
"empirical solution," which assumed each sample of masters' and nonmasters'
scores was normally distributed, regardless of the distributional form of the
parent population; and (4) each parent population's respective mean.

The LDF is suggested as an appropriate technique to utilize when the
populations of masters' and nonmasters' test scores are normally distributed
with equal but unknown variances and means; whereas, the QDF applied to the
ranks of the raw scores is a recommended approach for skewed distributions
(Conover and Iman, 1978). For both the LDF and QDF procedures, the standard
which minimized the probability of misclassification was the smallest score
such that either equation was greater than the constant:

    $\mathrm{LN}(C_{12}Q_2 / C_{21}Q_1)$                                      (3)

where $C_{ij}$ is the cost (either monetary, psychological or a combination)
of misclassifying an observation belonging to population j into population i
(i,j = 1,2), and $Q_i$ (i = 1,2) is the prior probability of group membership
(Anderson, 1951). In this study $C_{12}$ and $C_{21}$ were assumed equal. The
proportions of cases in the sample determining the masters ($Q_1$) and
nonmasters ($Q_2$) were used as estimates of group membership. (See Koffler,
1980, for a discussion of the QDF and LDF procedures.)
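
The decision rule formed by Equations 1 through 3 can be sketched in Python
as below. This is an illustrative reading of the formulas, not the code used
in the study: the function names, sample statistics, and the 1 to 100 scan
range are assumptions, and the QDF is evaluated here on raw scores rather
than on ranks.

```python
import math


def ldf(z, mean1, mean2, pooled_var):
    """Equation 1: linear discriminant score for test score z."""
    return (mean1 - mean2) / pooled_var * (z - (mean1 + mean2) / 2)


def qdf(z, mean1, var1, mean2, var2):
    """Equation 2: quadratic discriminant score for test score z."""
    return (z * (mean1 / var1 - mean2 / var2)
            - z ** 2 / 2 * (1 / var1 - 1 / var2)
            - 0.5 * (mean1 ** 2 / var1 - mean2 ** 2 / var2)
            + 0.5 * math.log(var2 / var1))


def smallest_passing_score(discriminant, q1, q2, c12=1.0, c21=1.0,
                           scores=range(1, 101)):
    """Smallest score whose discriminant value exceeds the Equation 3 constant."""
    constant = math.log((c12 * q2) / (c21 * q1))
    for z in scores:
        if discriminant(z) > constant:
            return z
    return None  # no score in the scanned range qualifies


# Hypothetical sample statistics: masters (group 1) and nonmasters (group 2).
equal_priors = smallest_passing_score(lambda z: ldf(z, 60, 41, 49), 0.5, 0.5)
more_nonmasters = smallest_passing_score(lambda z: ldf(z, 60, 41, 49), 0.4, 0.6)
quadratic = smallest_passing_score(lambda z: qdf(z, 60, 49, 41, 49), 0.5, 0.5)
print(equal_priors, more_nonmasters, quadratic)
```

With equal costs the constant reduces to the log ratio of the two group
proportions, so a sample containing relatively more nonmasters moves the
standard upward. When the two variances are equal, the quadratic term drops
out and Equation 2 gives the same standard as Equation 1.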

The third standard setting procedure employed for comparison purposes,
and referred to as the empirical solution, involved the simple equating of
two normal density functions, yielding Equation 2 (Lachenbruch, et al., 1973).
Sample values for the means and standard deviations for the masters and
nonmasters simulated distributions are substituted into the equation as noted
previously. The two major differences between this procedure and the LDF/QDF
strategies are, first, that the empirical solution was used to determine a
standard for the masters/nonmasters samples regardless of the distributional
form of the parent populations and, secondly, that the solution did not
employ Equation 3. Empirical evidence gathered from the results of the
administration of the reading and mathematics minimum competency tests within
the school district suggested that the test mean was a reasonable
approximation to the contrasting groups standard. Hence, the errors of
classification associated with the population mean for each sample were
tabulated and included for comparison.

Tables 2 and 3 present the descriptive results of the Monte Carlo paired
simulations for the master/nonmaster samples generated from the parent normal
and skewed distributions, respectively, by noise level and technique.



Analysis and Findings 



A repeated measures design was used to compare the various standard
setting procedures (P), the effect of noise level (N), and the repeated
measure (R), for each population. The dependent variable was the paired
errors of classification. Tables 4a and 4b, and 5a and 5b, present the ANOVA
results and descriptive statistics for the normal and left skewed
simulations, respectively.

Table 2

Descriptive Statistics of the Monte Carlo Simulation
for the Normal Population Distribution

Noise    Paired       Error                     Technique (a)
Level    Simulation   Rates    I            II           III          IV

Low      First        Range    .30-.72      .30-.83      .29-.78      .30-.78
                      Mean     .42          .49          .49          .51
                      SD       .08          .10          .09          .10
         Second       Range    .30-.64      .27-.65      .30-.65      .27-.63
                      Mean     .46          .47          .47          .47
                      SD       .12          .08          .08          .08

Medium   First        Range    .29-.51      .31-.56      .31-.57      .29-.60
                      Mean     .41          .44          .45          .45
                      SD       .05          .06          .07          .07
         Second       Range    .36-.66      .38-.62      .36-.66      .36-.60
                      Mean     .49          .49          .49          .48
                      SD       .06          .06          .06          .06

High     First        Range    .18-.52      .32-.59      .36-.60      .36-.60
                      Mean     .43          .48          .49          [illegible]
                      SD       .06          .05          .05          .05
         Second       Range    .36-.60      .38-.55      .38-.57      .25-.57
                      Mean     .47          .46          .46          .46
                      SD       .05          .05          .04          .06

a I equals contrasting groups; II equals LDF; III equals empirical
  solution; IV equals population mean.

Table 3

Descriptive Statistics of the Monte Carlo Simulation
for the Left Skewed Population Distribution

Noise    Paired       Error                     Technique (a)
Level    Simulation   Rates    I            II           III          IV

Low      First        Range    .23-.47      .28-.56      .27-.67      .23-.67
                      Mean     .36          .41          .44          .45
                      SD       .06          .08          .08          .09
         Second       Range    [illegible]  .30-.62      .27-.52      .28-.58
                      Mean     .45          .44          [illegible]  .45
                      SD       .08          .09          .07          .08

Medium   First        Range    .29-.48      .31-.57      .35-.63      .33-.63
                      Mean     .40          .45          .48          .47
                      SD       .05          .07          .07          .07
         Second       Range    .31-.85      .25-.70      .30-.85      .30-.85
                      Mean     .47          .45          .47          .47
                      SD       .11          .09          .11          .17

High     First        Range    .34-.53      .34-.56      .37-.61      .37-.60
                      Mean     .42          .46          .50          .49
                      SD       .04          .05          .05          .05
         Second       Range    .30-.59      .31-.59      .36-.60      .37-.56
                      Mean     .44          .43          .45          .44
                      SD       .06          .05          .06          .05

a I equals contrasting groups; II equals QDF; III equals empirical
  solution; IV equals population mean.
Table 4a

Repeated Measures ANOVA of Errors of
Classification for the Normal Distribution

Source of Variation           df        ms         F

Between
  Noise (N)                    2       .008        .8
  Procedure (P)                3       .091       9.10*
  N x P                        6       .009        .90
  Residual                   500       .010

Within
  Repeated Measure (R)         1       .016       7.27*
  R x N                        2       .116      52.73*
  R x P                        3       .053      24.09*
  R x N x P                    6       .0015       .68
  Residual                  1000       .0022

* p < .01

Table 4b

Means and Standard Deviations of Error Rates for
the Normal Distribution Repeated Measures ANOVA

                                        Procedure
Repeated            Contrasting   Discriminant   Empirical     Population
Measure             Group         Function       Solution      Mean

1         Mean      .42           .48            .48           .48
          SD        .06           [illegible]    [illegible]   [illegible]
2         Mean      .48           .48            .48           .47
          SD        .07           .06            .06           .07

Repeated                          Noise
Measure             Low           Medium         High

1         Mean      .48           .44            .48
          SD        .10           .06            .06
2         Mean      .47           .49            .46
          SD        .08           .06            .05
Table 5a

Repeated Measures ANOVA of Errors of
Classification for the Left Skewed Distribution

Source of Variation           df        ms         F

Between
  Noise (N)                    2       .115       9.58*
  Procedure (P)                3       .134      11.17*
  N x P                        6       .002        .17
  Residual                   344       .012

Within
  Repeated Measure (R)         1       .008       2.00
  R x N                        2       .065      16.25*
  R x P                        3       .057      14.25*
  R x N x P                    6       .0005       .125
  Residual                   688       .004

* p < .01
Table 5b

Means and Standard Deviations of Error Rates for
the Left Skewed Distribution Repeated Measures ANOVA

                                        Procedure
Repeated            Contrasting   Discriminant   Empirical     Population
Measure             Group         Function       Solution      Mean

1         Mean      .39           .44            .47           .47
          SD        .06           .07            .08           .07
2         Mean      .45           .44            .45           .45
          SD        .09           .08            .08           [illegible]

Repeated                          Noise
Measure             Low           Medium         High

1         Mean      .41           .44            .47
          SD        .09           .07            .06
2         Mean      .44           .46            .44
          SD        [illegible]   .10            .06
For the normal simulation, a significant main effect resulted for
procedure (P) and repeated measure (R), while significant interactions
appeared for repeated measure by noise level (R x N) and repeated measure by
procedure (R x P). Referring to Table 4b, the R x P interaction occurring for
the contrasting groups procedure shows an increase in the average errors of
classification over repeated measures, while for the three remaining
procedures the average errors of classification remain quite stable. Across
repeated samplings, the contrasting groups procedure does produce a
relatively lower average error rate. The fact that the main effect, noise
level, is not significant suggests that, in the case of the normal
distribution, the amount of master/nonmaster overlap has little influence on
errors of classification. However, the R x N interaction reveals the
instability of error, especially at the medium noise level, when examined
over repeated samplings.

Referring to Tables 5a and 5b, for the left skewed population simulation
results, both main effects, noise and procedure, are significant, with
significant interactions occurring for R x N and R x P. The significance of
P is due to the lower average error rate, across repeated samples, for the
contrasting groups technique. While the error rates for the three comparison
procedures remain relatively stable, the R x P interaction is due to the
instability of the errors, across repeated samplings, for the contrasting
groups technique. Interestingly, although N was not significant for the
normal simulations, it is significant for the left skewed population, with
the lowest average error rate, across repeated measures, occurring at the low
noise level. Likewise, the repeated measures factor, although significant
for the normal simulations and suggesting error rate instability, is not
significant for the left skewed simulations. The significance of the R x N
and R x P interactions for the left skewed simulations is due to the
significance of the differences for N and P, favoring the low noise level and
the contrasting groups procedure; while the R x N and R x P interactions
reported for the normal distributions are due to R and P, that is, to
instability of the error rates at the medium noise level and a low overall
average error rate for the contrasting groups technique. Regardless of the
population simulated, procedure or noise level, all error rates are high and
approach the chance level.

Discussion 

In their review of minimum competency testing and the accompanying
standard setting problem, Linn, et al. (1982) concluded that "there is no
good basis for judging one procedure for setting the passing score superior
to another." This statement was based upon a comparison of the differences
among the derived passing scores and the varying number of student failures
for the standard setting procedures investigated. Our investigation
approached this apparent dilemma by assuming that a reasonable method for
judging the superiority of a standard setting procedure was to investigate
the errors of classification associated with the techniques selected for this
study. Regardless of the resulting standard, it seems apparent that the
procedure with an "acceptably low" misclassification error rate would be the
most appealing strategy.

All standard setting procedures require an investment in time on behalf
of expert judges and other personnel. Notwithstanding the concerns plaguing
school administrators about the availability of monies to support remediation
programs, more consideration should be given to the "accuracy" of a given
technique. Based upon the experiences of the school district participating
in this study and of Educational Testing Service with minimum competency
tests, this study attempted to fill an information void in this area.

It is clear from the results that regardless of the standard setting
procedure and/or level of overlap, the misclassification rates are extremely
high and approach the 50 percent chance level in many instances. From
both a psychometric and administrative viewpoint, it is apparent that
determining competency (and the eventual allocation of funds for remediation)
on the basis of a one time administration of a minimum competency test is a
highly risky undertaking. Thus, Linn's concern about different judges and/or
different standard setting procedures producing different standards is
accompanied by the potential problem of unacceptably high error rates of
classification.

Based upon the results presented in the previous section, it is clear
that the contrasting groups technique shows the greatest average change in
error rates across repeated samplings (see Tables 4b and 5b). A brief
examination of how the standards are computed for each of the techniques
studied should shed some light on this phenomenon. Both the QDF and LDF
procedures take into account the sample statistics and prior membership
probabilities of the groups involved. These statistics remained "relatively"
stable across individual members of a pair as well as across simulated pairs.
Although the empirical solution did not incorporate the criterion of prior
membership probabilities ($\mathrm{LN}(Q_1/Q_2)$ was approximately zero for
this study), the sample statistics for the master/nonmaster groups were, as
noted for the QDF and LDF techniques, essentially stable. The parent
population mean was "fixed" as the standard across simulations, and since it
was not conceived as an estimate of a standard in terms of a minimum error
rate, the errors of classification were free to vary and yet reflect good
agreement with the LDF, QDF and empirical solutions. This could be a function
of the overall parent population characteristics; however, the contrasting
groups procedure does not exhibit a similar agreement across repeated
samplings.

The derived standard of the contrasting groups procedure, although based
upon the minimum number of classification errors, does not take into account
master/nonmaster sample statistics or prior membership probabilities. Because
each standard is established for a given masters/nonmasters frequency
distribution, shifts in the score distributions across simulated pairs can
result in potentially large differences in the derived standard, as well as
potentially large changes in the accompanying error rates.

As noted by Divgi (1982), "Standards are set because they have to be.
In situations where it is believed (at least by those in authority) that
imperfect standards are better than none. No standard can satisfy everybody.
One can only ask that the standard be reasonable, and that those who set it
be aware of what they are doing and why." From a strictly psychometric
viewpoint and consistent with Divgi's statement, we would argue against making
decisions concerning competency based upon a single test administration, and
opt for a more carefully delineated school district testing program.
Decisions regarding standard setting, competency and remediation should be
based upon a combination of a student's longitudinal history of testing and
classroom performance tempered by teacher input.



Reference Notes 

1. Samuel Livingston, via a telephone conversation, noted similar trends in
   district results associated with the Basic Skills Assessment Tests
   developed by ETS and employed as competency tests in reading and
   mathematics.

2. Initially, if several different standards resulted in an equal number of
   minimum errors of classification, the lowest score was labeled as a Type
   I estimate, more false masters; whereas, the higher score was designated
   as a Type II estimate, more false nonmasters. Subsequent analyses
   produced very similar results for these two classifications;
   consequently, it was decided to present only the Type I findings in the
   report.


